This is information on the R we are using.
sessionInfo()
## R version 3.3.1 Patched (2016-10-19 r71540)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X Yosemite 10.10.5
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] magrittr_1.5 tools_3.3.1 htmltools_0.3.5 yaml_2.1.13
## [5] Rcpp_0.12.6 stringi_1.1.1 rmarkdown_1.0 knitr_1.14
## [9] stringr_1.1.0 digest_0.6.10 evaluate_0.9
For our project we examined four different data sets found on the data.gov website. The data sets we examined were of Austin Salaries, Parish Salaries, and Austin Crime data from 2014 and 2016. In order to have the data ready to be imported into our SQL database, we scrubbed the data of all non-readible characters that SQL would reject.
URLs for data sets: Austin Employee data: https://catalog.data.gov/dataset/city-of-austin-employee-detail-information Baton Rouge Employee data: https://catalog.data.gov/dataset/city-parish-employees Austin Crime 2014 data: https://catalog.data.gov/dataset/annual-crime-2014 Austin Crime 2016 data: http://us-city.census.okfn.org/dataset/crime-stats
Process
1. In the ETL file, we read in the csv file into R Studio and retrived the names of the columns of the data set.
2. We removed or replaced non-readable characters for each of the names.
3. Replaced “.” with “_“.
4. Replaced”-" with " “.
5. Replaced”&" with “AND”.
6. Replaced “/” with “OR”.
7. Reformatted the csv file with the revised name of the column.
8. Printed out SQL code to make the table for our data set in order to upload it to Oracle.
Below are the four different ETL files we created:
Austin, Texas Salaries:
#source("C:/Users/Rabin Bhattarai/Desktop/Data Visualization/DV_FinalProject3/01 Data/austin_salaries_ETL.R", echo = TRUE)
source("/Users/JPNJOCKII/dv_finalproject_rabinbhattarai/01 Data/austin_salaries_ETL.R", echo = TRUE)
##
## > require(tidyr)
## Loading required package: tidyr
##
## > require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
## > require(ggplot2)
## Loading required package: ggplot2
##
## > setwd("~/dv_finalproject_rabinbhattarai/01 Data")
##
## > file_path <- "austin_salaries.csv"
##
## > salaries <- read.csv(file_path, stringsAsFactors = FALSE)
##
## > df <- rename(salaries, first_name = First, last_name = Last)
##
## > for (n in names(df)) {
## + df[n] <- data.frame(lapply(df[n], gsub, pattern = "[^ -~]",
## + replacement = ""))
## + }
##
## > str(df)
## 'data.frame': 15521 obs. of 29 variables:
## $ Emp.ID : Factor w/ 15521 levels "10000003","10000008",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ last_name : Factor w/ 7088 levels "Aalbers","Aarniokoski",..: 6477 4347 2504 6903 6817 940 4036 3884 6947 5390 ...
## $ first_name : Factor w/ 3379 levels "A J","Aakash",..: 858 151 1451 2909 1255 1596 1451 136 1236 2190 ...
## $ MI : Factor w/ 2942 levels "","A","A M","A.",..: 645 1425 265 167 2805 1876 70 857 871 1767 ...
## $ Department : Factor w/ 37 levels "11E","15E","16E",..: 34 32 30 34 32 32 32 32 36 32 ...
## $ Department.Name : Factor w/ 36 levels "Animal Services",..: 20 32 23 20 32 32 32 32 15 32 ...
## $ Division.No : Factor w/ 504 levels "1.10E+01","1.10E+101",..: 469 440 353 461 440 440 440 440 501 440 ...
## $ Division.Title : Factor w/ 484 levels "AAQL","Accounting",..: 477 400 278 240 400 400 400 400 303 400 ...
## $ Effective.Date : Factor w/ 651 levels "1/1/2012","1/1/2015",..: 626 173 480 608 413 48 440 626 574 52 ...
## $ Length.of.Service.w.City: Factor w/ 42 levels "0","1","10","11",..: 8 11 16 20 10 15 9 8 11 6 ...
## $ Length.of.Service.in.Job: Factor w/ 26 levels "0","1","10","11",..: 9 13 10 4 20 4 13 2 24 23 ...
## $ Staffing.Level : Factor w/ 5 levels "1","2","3","4",..: 2 3 2 1 2 1 3 1 2 1 ...
## $ Staffing.Level.Desc : Factor w/ 6 levels "","Council Appointed",..: 5 4 5 6 5 6 4 6 5 6 ...
## $ Posn : Factor w/ 12106 levels "100001","100002",..: 4540 2359 6528 3454 2468 5123 2377 2495 4390 8195 ...
## $ Title : Factor w/ 1168 levels "727","911 Call Taker",..: 1095 822 571 18 829 823 826 823 417 828 ...
## $ Job.Status : Factor w/ 3 levels "A","B","L": 1 1 1 1 1 1 1 1 1 1 ...
## $ Employee.Status : Factor w/ 4 levels "A","B","L","T": 1 1 1 1 1 1 1 1 1 1 ...
## $ ECLS.Code : Factor w/ 21 levels "1","12","13",..: 9 8 9 1 15 15 8 15 13 15 ...
## $ ECLS.Desc : Factor w/ 21 levels "City Councilmembers",..: 17 13 17 18 14 14 13 14 4 14 ...
## $ Job.FTE : Factor w/ 28 levels "0.05","0.125",..: 23 23 23 23 23 23 23 23 23 23 ...
## $ Job.Hrs.Pay : Factor w/ 24 levels "10","106","12",..: 21 21 21 21 21 21 21 21 22 21 ...
## $ Hourly.Rate : Factor w/ 2566 levels "$0.00","$10.00",..: 1337 2501 2141 735 2311 2215 2441 2104 2007 1715 ...
## $ Annual.Salary : Factor w/ 2879 levels "$0.00","$1,560.00",..: 1936 355 2817 1310 124 24 274 2780 2776 2343 ...
## $ EEO.Code : Factor w/ 8 levels "10","20","30",..: 2 2 2 6 4 4 4 4 2 4 ...
## $ EEO.Desc : Factor w/ 8 levels "Admin/Supp","Official/Adm",..: 4 4 4 1 5 5 5 5 4 5 ...
## $ Age : Factor w/ 75 levels "0","14","15",..: 38 34 38 45 28 33 38 29 43 27 ...
## $ Ethnicity : Factor w/ 7 levels "","American Indian/Aleutian",..: 7 7 7 5 7 7 7 7 7 7 ...
## $ Ethnicity.Code : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 3 1 1 1 1 1 1 ...
## $ Gender : Factor w/ 2 levels "F","M": 1 2 2 1 2 1 2 1 2 2 ...
##
## > str(salaries)
## 'data.frame': 15521 obs. of 29 variables:
## $ Emp.ID : int 10000003 10000008 10000015 10000022 10000026 10000039 10000066 10000084 10000127 10000132 ...
## $ Last : chr "Turner" "Michael" "Greco" "Wiswell-DeCampo" ...
## $ First : chr "Diane" "Andrew" "James" "Sherryl" ...
## $ MI : chr "Dallas" "Joseph" "B" "Ann" ...
## $ Department : chr "91G" "87G" "85G" "91G" ...
## $ Department.Name : chr "Health & Human Services" "Police" "Library" "Health & Human Services" ...
## $ Division.No : chr "91G427" "87G010" "85G100" "91G100" ...
## $ Division.Title : chr "Womens,Infant,Children (WIC)" "Police Law Enforcement" "Lib Information Systems" "HHSD Office of the Director" ...
## $ Effective.Date : chr "9/21/2014" "12/20/2014" "6/28/2015" "8/9/2015" ...
## $ Length.of.Service.w.City: int 15 18 22 26 17 21 16 15 18 13 ...
## $ Length.of.Service.in.Job: int 16 2 17 11 3 11 2 1 7 6 ...
## $ Staffing.Level : int 2 3 2 1 2 1 3 1 2 1 ...
## $ Staffing.Level.Desc : chr "Supervisor" "Manager" "Supervisor" "Worker" ...
## $ Posn : chr "105664" "102924" "108449" "104243" ...
## $ Title : chr "Supv, WIC" "Police Commander" "IT Supervisor Sr" "Administrative Specialist" ...
## $ Job.Status : chr "A" "A" "A" "A" ...
## $ Employee.Status : chr "A" "A" "A" "A" ...
## $ ECLS.Code : int 2 19 2 1 3 3 19 3 23 3 ...
## $ ECLS.Desc : chr "Standard/Exempt" "Police/Exempt" "Standard/Exempt" "Standard/Non-Exempt" ...
## $ Job.FTE : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Job.Hrs.Pay : int 80 80 80 80 80 80 80 80 84 80 ...
## $ Hourly.Rate : chr "$30.10" "$70.24" "$46.32" "$21.96" ...
## $ Annual.Salary : chr "$62,608.00" "$146,101.28" "$96,345.60" "$45,676.80" ...
## $ EEO.Code : int 20 20 20 60 40 40 40 40 20 40 ...
## $ EEO.Desc : chr "Professionals" "Professionals" "Professionals" "Admin/Supp" ...
## $ Age : int 50 46 50 57 40 45 50 41 55 39 ...
## $ Ethnicity : chr "White" "White" "White" "Hispanic" ...
## $ Ethnicity.Code : int 1 1 1 3 1 1 1 1 1 1 ...
## $ Gender : chr "F" "M" "M" "F" ...
##
## > measures <- c("Length.of.Service.w.City", "Length.of.Service.in.Job",
## + "Job.Hrs.Pay", "Age", "Hourly.Rate", "Annual.Salary", "Emp.ID",
## + .... [TRUNCATED]
##
## > dimensions <- setdiff(names(df), measures)
##
## > dimensions
## [1] "last_name" "first_name" "MI"
## [4] "Department" "Department.Name" "Division.No"
## [7] "Division.Title" "Effective.Date" "Staffing.Level.Desc"
## [10] "Posn" "Title" "Job.Status"
## [13] "Employee.Status" "ECLS.Code" "ECLS.Desc"
## [16] "Job.FTE" "EEO.Code" "EEO.Desc"
## [19] "Ethnicity" "Ethnicity.Code" "Gender"
##
## > if (length(measures) > 1 || !is.na(dimensions)) {
## + for (d in dimensions) {
## + if (d == "Effective.Date")
## + df[d] <- as.Date .... [TRUNCATED]
##
## > if (length(measures) > 1 || !is.na(measures)) {
## + for (m in measures) {
## + df[m] <- data.frame(lapply(df[m], gsub, pattern = "[^--.0-9]", .... [TRUNCATED]
##
## > write.csv(df, paste(gsub(".csv", "", file_path), ".reformatted.csv",
## + sep = ""), row.names = FALSE, na = "")
##
## > tableName <- gsub(" +", "_", gsub("[^A-z, 0-9, ]",
## + "", gsub(".csv", "", file_path)))
##
## > sql <- paste("CREATE TABLE", tableName, "(\n-- Change table_name to the table name you want.\n")
##
## > if (length(measures) > 1 || !is.na(dimensions)) {
## + for (d in dimensions) {
## + if (d == "Effective.Date")
## + sql <- paste(sql .... [TRUNCATED]
##
## > if (length(measures) > 1 || !is.na(measures)) {
## + for (m in measures) {
## + if (m != tail(measures, n = 1))
## + sql <- paste(sq .... [TRUNCATED]
##
## > sql <- paste(sql, ");")
##
## > cat(sql)
## CREATE TABLE austin_salaries (
## -- Change table_name to the table name you want.
## last_name varchar2(4000),
## first_name varchar2(4000),
## MI varchar2(4000),
## Department varchar2(4000),
## Department.Name varchar2(4000),
## Division.No varchar2(4000),
## Division.Title varchar2(4000),
## Effective.Date date,
## Staffing.Level.Desc varchar2(4000),
## Posn varchar2(4000),
## Title varchar2(4000),
## Job.Status varchar2(4000),
## Employee.Status varchar2(4000),
## ECLS.Code varchar2(4000),
## ECLS.Desc varchar2(4000),
## Job.FTE varchar2(4000),
## EEO.Code varchar2(4000),
## EEO.Desc varchar2(4000),
## Ethnicity varchar2(4000),
## Ethnicity.Code varchar2(4000),
## Gender varchar2(4000),
## Length.of.Service.w.City number(38,4),
## Length.of.Service.in.Job number(38,4),
## Job.Hrs.Pay number(38,4),
## Age number(38,4),
## Hourly.Rate number(38,4),
## Annual.Salary number(38,4),
## Emp.ID number(38,4),
## Staffing.Level number(38,4)
## );
Parish, Louisiana Salaries
#source("C:/Users/Rabin Bhattarai/Desktop/Data Visualization/DV_FinalProject3/01 Data/parish_salaries_ETL.R", echo = TRUE)
source("/Users/JPNJOCKII/dv_finalproject_rabinbhattarai/01 Data/parish_salaries_ETL.R", echo = TRUE)
##
## > require(tidyr)
##
## > require(dplyr)
##
## > require(ggplot2)
##
## > setwd("~/dv_finalproject_rabinbhattarai/01 Data")
##
## > file_path <- "City-Parish_Employees.csv"
##
## > df <- read.csv(file_path, stringsAsFactors = FALSE)
##
## > names(df) <- gsub("\\.+", "_", names(df))
##
## > str(df)
## 'data.frame': 23300 obs. of 24 variables:
## $ EMPLOYEE_NUM : int 526479 513040 499447 365831 476749 449784 487112 505145 526509 526517 ...
## $ LAST_NAME : chr "HALL" "HARRISON" "THIBODEAUX" "HOUSTON" ...
## $ FIRST_NAME : chr "LOUIS" "KEITH" "SAMANTHA" "ASHLEY" ...
## $ MIDDLE_INITIAL : chr "" "D" "L" "L" ...
## $ DEPARTMENT_NUM : int 77 70 60 70 41 41 41 77 50 79 ...
## $ DEPARTMENT_NAME : chr "DEPT OF MAINTENANCE" "DEPARTMENT OF PUBLIC WORKS" "HUMAN DEVELOPMENT AND SERVICES" "DEPARTMENT OF PUBLIC WORKS" ...
## $ DIVISION_NUM : int 7702002 7031102 6044706 7070303 4120000 4120000 4120000 7702002 5050001 7901000 ...
## $ DIVISION_NAME : chr "MAINTENANCE-LANDSCAPE MAINT" "DPW-EAST LOT - STREETS & ROADS" "COMMUNITY SERVICES BLOCK GRANT-ADMIN" "SEWERAGE-WASTEWATER COLLECTION" ...
## $ JOB_CODE : int 160105 151375 110620 151750 100160 100160 100160 160105 110090 141115 ...
## $ JOB_TITLE : chr "LABORER" "MAINTENANCE WORKER II" "ADMINISTRATIVE SPECIALIST I" "COMPLAINT INVESTIGATOR" ...
## $ PAY_RANGE : int 1040 1080 1110 1110 2260 2260 2260 1040 1070 1140 ...
## $ PAY_STEP : int 5 1 3 13 10 12 10 5 2 6 ...
## $ SEX : chr "M" "M" "F" "M" ...
## $ RACE : chr "B" "B" "B" "B" ...
## $ CURRENT_HIRE_DATE : chr "07/11/2016" "09/22/2014" "12/26/2012" "11/24/2004" ...
## $ EMPLOYMENT_END_DATE : chr "" "12/19/2014" "" "" ...
## $ YEARS_SERVICE : int 0 0 3 11 5 7 5 0 0 0 ...
## $ SCHEDULED_HOURS : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LONGEVITY_PERCENTAGE: num 0 0 0 0 0 0 0 0 0 0 ...
## $ HOURLY_RATE : num 9.3 9.79 12.26 16.48 31.35 ...
## $ TOTAL_HOURLY_RATE : num 9.3 9.79 12.26 16.48 31.35 ...
## $ OVERTIME_HOURLY_RATE: num 14 14.7 18.4 24.7 47 ...
## $ ANNUAL_SALARY : chr "$19351.02" "$20359.04" "$25503.14" "$34273.98" ...
## $ EMPLOYMENT_STATUS : int 0 5 0 0 0 0 0 0 0 0 ...
##
## > measures <- c("YEARS_SERVICE", "TOTAL_HOURLY_RATE",
## + "ANNUAL_SALARY", "YEARS_SERVICE")
##
## > for (n in names(df)) {
## + df[n] <- data.frame(lapply(df[n], gsub, pattern = "[^ -~]",
## + replacement = ""))
## + }
##
## > dimensions <- setdiff(names(df), measures)
##
## > if (length(measures) > 1 || !is.na(dimensions)) {
## + for (d in dimensions) {
## + df[d] <- data.frame(lapply(df[d], gsub, pattern = "[\"']", ..." ... [TRUNCATED]
##
## > library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
##
## > if (length(measures) > 1 || !is.na(measures)) {
## + for (m in measures) {
## + df[m] <- data.frame(lapply(df[m], gsub, pattern = "[^--.0-9]", .... [TRUNCATED]
##
## > write.csv(df, paste(gsub(".csv", "", file_path), ".reformatted.csv",
## + sep = ""), row.names = FALSE, na = "")
##
## > tableName <- gsub(" +", "_", gsub("[^A-z, 0-9, ]",
## + "", gsub(".csv", "", file_path)))
##
## > sql <- paste("CREATE TABLE", tableName, "(\n-- Change table_name to the table name you want.\n")
##
## > if (length(measures) > 1 || !is.na(dimensions)) {
## + for (d in dimensions) {
## + sql <- paste(sql, paste(d, "varchar2(4000),\n"))
## + }
## + .... [TRUNCATED]
##
## > if (length(measures) > 1 || !is.na(measures)) {
## + for (m in measures) {
## + if (m != tail(measures, n = 1))
## + sql <- paste(sq .... [TRUNCATED]
##
## > sql <- paste(sql, ");")
##
## > cat(sql)
## CREATE TABLE CityParish_Employees (
## -- Change table_name to the table name you want.
## EMPLOYEE_NUM varchar2(4000),
## LAST_NAME varchar2(4000),
## FIRST_NAME varchar2(4000),
## MIDDLE_INITIAL varchar2(4000),
## DEPARTMENT_NUM varchar2(4000),
## DEPARTMENT_NAME varchar2(4000),
## DIVISION_NUM varchar2(4000),
## DIVISION_NAME varchar2(4000),
## JOB_CODE varchar2(4000),
## JOB_TITLE varchar2(4000),
## PAY_RANGE varchar2(4000),
## PAY_STEP varchar2(4000),
## SEX varchar2(4000),
## RACE varchar2(4000),
## CURRENT_HIRE_DATE varchar2(4000),
## EMPLOYMENT_END_DATE varchar2(4000),
## SCHEDULED_HOURS varchar2(4000),
## LONGEVITY_PERCENTAGE varchar2(4000),
## HOURLY_RATE varchar2(4000),
## OVERTIME_HOURLY_RATE varchar2(4000),
## EMPLOYMENT_STATUS varchar2(4000),
## YEARS_SERVICE number(38,4)
## TOTAL_HOURLY_RATE number(38,4),
## ANNUAL_SALARY number(38,4),
## YEARS_SERVICE number(38,4)
## );
Austin, Texas Crime 2014:
#source("C:/Users/Rabin Bhattarai/Desktop/Data Visualization/DV_FinalProject3/01 Data/austincrime14_ETL.R", echo = TRUE)
source("/Users/JPNJOCKII/dv_finalproject_rabinbhattarai/01 Data/austincrime14_ETL.R", echo = TRUE)
##
## > require(tidyr)
##
## > require(dplyr)
##
## > require(ggplot2)
##
## > setwd("~/dv_finalproject_rabinbhattarai/01 Data")
##
## > file_path <- "AustinCrime14.csv"
##
## > atxCrime14 <- read.csv(file_path, stringsAsFactors = FALSE)
##
## > names(atxCrime14)
## [1] "GO.Primary.Key"
## [2] "Council.District"
## [3] "GO.Highest.Offense.Desc"
## [4] "Highest.NIBRS.UCR.Offense.Description"
## [5] "GO.Report.Date"
## [6] "GO.Location"
## [7] "Clearance.Status"
## [8] "Clearance.Date"
## [9] "GO.District"
## [10] "GO.Location.Zip"
## [11] "GO.Census.Tract"
## [12] "GO.X.Coordinate"
## [13] "GO.Y.Coordinate"
## [14] "Location_1"
##
## > str(atxCrime14)
## 'data.frame': 40641 obs. of 14 variables:
## $ GO.Primary.Key : num 2.01e+10 2.01e+10 2.01e+10 2.01e+10 2.01e+10 ...
## $ Council.District : int 1 1 3 1 3 1 9 4 3 9 ...
## $ GO.Highest.Offense.Desc : chr "AGG ROBBERY/DEADLY WEAPON" "ROBBERY BY ASSAULT" "ROBBERY BY THREAT" "AGG ROBBERY/DEADLY WEAPON" ...
## $ Highest.NIBRS.UCR.Offense.Description: chr "Robbery" "Robbery" "Robbery" "Robbery" ...
## $ GO.Report.Date : chr "4/17/2014" "4/25/2014" "5/11/2014" "6/16/2014" ...
## $ GO.Location : chr "12151 N IH 35 SVRD NB" "3300 BLOCK ROCKHURST LN" "E 7TH ST / CHICON ST" "WHELESS LN / BERKMAN DR" ...
## $ Clearance.Status : chr "N" "N" "N" "C" ...
## $ Clearance.Date : chr "4/28/2014 0:00:00" "5/20/2014 0:00:00" "5/13/2014 0:00:00" "3/24/2015 0:00:00" ...
## $ GO.District : chr "E" "I" "C" "I" ...
## $ GO.Location.Zip : int 78753 78723 78702 78723 78702 78722 78701 78753 78741 78701 ...
## $ GO.Census.Tract : num 18.35 21.13 9.02 21.04 9.02 ...
## $ GO.X.Coordinate : int 3135985 3137985 3120890 3130566 3117732 3122536 3114310 3127097 3127162 3116520 ...
## $ GO.Y.Coordinate : int 10117220 10087946 10068910 10089446 10068195 10075649 10070753 10097584 10055519 10070158 ...
## $ Location_1 : chr "12151 N IH 35 SVRD NB\nAustin, TX 78753\n" "3300 BLOCK ROCKHURST LN\nAustin, TX 78723\n" "E 7TH ST / CHICON ST\nAustin, TX 78702\n" "WHELESS LN / BERKMAN DR\nAustin, TX 78723\n" ...
##
## > View(atxCrime14)
##
## > df <- rename(atxCrime14, Crime_Type = GO.Highest.Offense.Desc,
## + Date_of_crime = GO.Report.Date, zip_code = GO.Location.Zip,
## + address = G .... [TRUNCATED]
##
## > str(df)
## 'data.frame': 40641 obs. of 14 variables:
## $ GO.Primary.Key : num 2.01e+10 2.01e+10 2.01e+10 2.01e+10 2.01e+10 ...
## $ Council.District : int 1 1 3 1 3 1 9 4 3 9 ...
## $ Crime_Type : chr "AGG ROBBERY/DEADLY WEAPON" "ROBBERY BY ASSAULT" "ROBBERY BY THREAT" "AGG ROBBERY/DEADLY WEAPON" ...
## $ Highest.NIBRS.UCR.Offense.Description: chr "Robbery" "Robbery" "Robbery" "Robbery" ...
## $ Date_of_crime : chr "4/17/2014" "4/25/2014" "5/11/2014" "6/16/2014" ...
## $ address : chr "12151 N IH 35 SVRD NB" "3300 BLOCK ROCKHURST LN" "E 7TH ST / CHICON ST" "WHELESS LN / BERKMAN DR" ...
## $ Clearance.Status : chr "N" "N" "N" "C" ...
## $ Clearance.Date : chr "4/28/2014 0:00:00" "5/20/2014 0:00:00" "5/13/2014 0:00:00" "3/24/2015 0:00:00" ...
## $ GO.District : chr "E" "I" "C" "I" ...
## $ zip_code : int 78753 78723 78702 78723 78702 78722 78701 78753 78741 78701 ...
## $ GO.Census.Tract : num 18.35 21.13 9.02 21.04 9.02 ...
## $ GO.X.Coordinate : int 3135985 3137985 3120890 3130566 3117732 3122536 3114310 3127097 3127162 3116520 ...
## $ GO.Y.Coordinate : int 10117220 10087946 10068910 10089446 10068195 10075649 10070753 10097584 10055519 10070158 ...
## $ Location_1 : chr "12151 N IH 35 SVRD NB\nAustin, TX 78753\n" "3300 BLOCK ROCKHURST LN\nAustin, TX 78723\n" "E 7TH ST / CHICON ST\nAustin, TX 78702\n" "WHELESS LN / BERKMAN DR\nAustin, TX 78723\n" ...
##
## > for (n in names(df)) {
## + df[n] <- data.frame(lapply(df[n], gsub, pattern = "[^ -~]",
## + replacement = ""))
## + }
##
## > str(df)
## 'data.frame': 40641 obs. of 14 variables:
## $ GO.Primary.Key : Factor w/ 40641 levels "20141000078",..: 493 1046 2046 4271 6947 7178 8744 10165 10410 10574 ...
## $ Council.District : Factor w/ 10 levels "1","10","2","3",..: 1 1 4 1 4 1 10 5 4 10 ...
## $ Crime_Type : Factor w/ 44 levels "AGG ASLT ENHANC STRANGL/SUFFOC",..: 11 30 31 11 11 30 30 31 11 30 ...
## $ Highest.NIBRS.UCR.Offense.Description: Factor w/ 14 levels "Aggravated Assault",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ Date_of_crime : Factor w/ 365 levels "1/1/2014","1/10/2014",..: 191 200 215 251 292 14 317 342 345 349 ...
## $ address : Factor w/ 17327 levels "","1 OLMOS DR",..: 2378 8045 16781 17315 17299 4733 16764 14267 4734 13160 ...
## $ Clearance.Status : Factor w/ 4 levels "","C","N","O": 3 3 3 2 3 2 3 2 3 3 ...
## $ Clearance.Date : Factor w/ 498 levels "","1/1/2014 0:00:00",..: 299 336 324 240 75 196 457 104 78 69 ...
## $ GO.District : Factor w/ 11 levels "A","AP","B","C",..: 6 10 4 10 8 4 8 6 9 8 ...
## $ zip_code : Factor w/ 48 levels "78610","78613",..: 43 17 8 17 8 16 7 43 32 7 ...
## $ GO.Census.Tract : Factor w/ 207 levels "1.01","1.02",..: 88 146 206 137 206 195 4 68 162 4 ...
## $ GO.X.Coordinate : Factor w/ 14523 levels "3057251","3058024",..: 13311 13555 9061 12109 7939 9653 6487 11115 11131 7474 ...
## $ GO.Y.Coordinate : Factor w/ 15553 levels "10010316","10010757",..: 13964 10081 6356 10328 6146 7945 6970 11408 3798 6738 ...
## $ Location_1 : Factor w/ 17369 levels "1 OLMOS DRAustin, TX 78744(30.21909, -97.6874)",..: 2378 8048 16823 17357 17341 4734 16806 14272 4735 13165 ...
##
## > measures <- c("GO.X.Coordinate", "GO.Y.Coordinate")
##
## > dimensions <- c("Date_of_crime", "address", "zip_code",
## + "Crime_Type")
##
## > dimensions
## [1] "Date_of_crime" "address" "zip_code" "Crime_Type"
##
## > View(df)
##
## > if (length(measures) > 1 || !is.na(dimensions)) {
## + for (d in dimensions) {
## + if (d == "Date_of_crime")
## + df[d] <- as.Date( .... [TRUNCATED]
##
## > View(df)
##
## > if (length(measures) > 1 || !is.na(measures)) {
## + for (m in measures) {
## + df[m] <- data.frame(lapply(df[m], gsub, pattern = "[^--.0-9]", .... [TRUNCATED]
##
## > View(df)
##
## > write.csv(df, paste(gsub(".csv", "", file_path), ".reformatted.csv",
## + sep = ""), row.names = FALSE, na = "")
##
## > tableName <- gsub(" +", "_", gsub("[^A-z, 0-9, ]",
## + "", gsub(".csv", "", file_path)))
##
## > sql <- paste("CREATE TABLE", tableName, "(\n-- Change table_name to the table name you want.\n")
##
## > if (length(measures) > 1 || !is.na(dimensions)) {
## + for (d in dimensions) {
## + if (d == "Date_of_crime")
## + sql <- paste(sql, .... [TRUNCATED]
##
## > if (length(measures) > 1 || !is.na(measures)) {
## + for (m in measures) {
## + if (m != tail(measures, n = 1))
## + sql <- paste(sq .... [TRUNCATED]
##
## > sql <- paste(sql, ");")
##
## > cat(sql)
## CREATE TABLE AustinCrime14 (
## -- Change table_name to the table name you want.
## Date_of_crime date,
## address varchar2(4000),
## zip_code varchar2(4000),
## Crime_Type varchar2(4000),
## GO.X.Coordinate number(38,4),
## GO.Y.Coordinate number(38,4)
## );
Austin, Texas Crime 2016:
#source("C:/Users/Rabin Bhattarai/Desktop/Data Visualization/DV_FinalProject3/01 Data/austincrime_ETL.R", echo = TRUE)
source("/Users/JPNJOCKII/dv_finalproject_rabinbhattarai/01 Data/austincrime_ETL.R", echo = TRUE)
##
## > require(tidyr)
##
## > require(dplyr)
##
## > require(ggplot2)
##
## > setwd("~/dv_finalproject_rabinbhattarai/01 Data")
##
## > file_path <- "AustinCrime.csv"
##
## > atxCrime <- read.csv(file_path, stringsAsFactors = FALSE)
##
## > names(atxCrime)
## [1] "Incident.Report.Number" "Crime.Type"
## [3] "Date" "Time"
## [5] "LOCATION_TYPE" "ADDRESS"
## [7] "LONGITUDE" "LATITUDE"
## [9] "State" "Country"
## [11] "Continent"
##
## > str(atxCrime)
## 'data.frame': 7015 obs. of 11 variables:
## $ Incident.Report.Number: num 2.02e+08 2.02e+10 2.02e+10 2.02e+09 2.02e+08 ...
## $ Crime.Type : chr "AGG ASSAULT" "ASSAULT INFORMATION" "THEFT" "ASSAULT INFORMATION" ...
## $ Date : chr "1/2/2016" "4/22/2016" "5/14/2016" "3/21/2016" ...
## $ Time : int 557 1800 2000 2126 1900 2300 1528 1417 0 6 ...
## $ LOCATION_TYPE : logi NA NA NA NA NA NA ...
## $ ADDRESS : chr "W MARTIN LUTHER KING JR BLVD / NUECES ST" "MANOR RD / ROGGE LN" "WHELESS LN / LINDA LN" "PROCK LN / SPRINGDALE RD" ...
## $ LONGITUDE : num -97.7 -97.7 -97.7 -97.7 -97.7 ...
## $ LATITUDE : num 30.3 30.3 30.3 30.3 30.3 ...
## $ State : chr "Texas" "Texas" "Texas" "Texas" ...
## $ Country : chr "United States of America" "United States of America" "United States of America" "United States of America" ...
## $ Continent : chr "North America" "North America" "North America" "North America" ...
##
## > View(atxCrime)
##
## > df <- rename(atxCrime, Crime_Type = Crime.Type, Incident_report_number = Incident.Report.Number)
##
## > str(df)
## 'data.frame': 7015 obs. of 11 variables:
## $ Incident_report_number: num 2.02e+08 2.02e+10 2.02e+10 2.02e+09 2.02e+08 ...
## $ Crime_Type : chr "AGG ASSAULT" "ASSAULT INFORMATION" "THEFT" "ASSAULT INFORMATION" ...
## $ Date : chr "1/2/2016" "4/22/2016" "5/14/2016" "3/21/2016" ...
## $ Time : int 557 1800 2000 2126 1900 2300 1528 1417 0 6 ...
## $ LOCATION_TYPE : logi NA NA NA NA NA NA ...
## $ ADDRESS : chr "W MARTIN LUTHER KING JR BLVD / NUECES ST" "MANOR RD / ROGGE LN" "WHELESS LN / LINDA LN" "PROCK LN / SPRINGDALE RD" ...
## $ LONGITUDE : num -97.7 -97.7 -97.7 -97.7 -97.7 ...
## $ LATITUDE : num 30.3 30.3 30.3 30.3 30.3 ...
## $ State : chr "Texas" "Texas" "Texas" "Texas" ...
## $ Country : chr "United States of America" "United States of America" "United States of America" "United States of America" ...
## $ Continent : chr "North America" "North America" "North America" "North America" ...
##
## > for (n in names(df)) {
## + df[n] <- data.frame(lapply(df[n], gsub, pattern = "[^ -~]",
## + replacement = ""))
## + }
##
## > View(df)
##
## > str(df)
## 'data.frame': 7015 obs. of 11 variables:
## $ Incident_report_number: Factor w/ 7015 levels "20161000059",..: 2072 294 4760 6640 6028 6457 754 442 6145 6962 ...
## $ Crime_Type : Factor w/ 212 levels "ABAND VEH (STORAGE FACILITY)",..: 9 25 176 25 27 179 132 132 132 176 ...
## $ Date : Factor w/ 257 levels "1/1/2016","1/10/2016",..: 12 123 145 89 26 37 149 140 98 134 ...
## $ Time : Factor w/ 1318 levels "0","1","10","100",..: 1126 525 646 743 585 857 366 284 1 1131 ...
## $ LOCATION_TYPE : Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
## $ ADDRESS : Factor w/ 3116 levels "0 N MOPAC NB AT MOPAC RAILRO TRN",..: 2874 1490 3074 2003 342 733 785 418 571 561 ...
## $ LONGITUDE : Factor w/ 2336 levels "-0.0001335584",..: 1447 228 330 367 481 1013 1171 1128 900 661 ...
## $ LATITUDE : Factor w/ 2336 levels "0.0001204538",..: 1275 1529 1637 1061 1762 865 1003 682 1127 1162 ...
## $ State : Factor w/ 1 level "Texas": 1 1 1 1 1 1 1 1 1 1 ...
## $ Country : Factor w/ 1 level "United States of America": 1 1 1 1 1 1 1 1 1 1 ...
## $ Continent : Factor w/ 1 level "North America": 1 1 1 1 1 1 1 1 1 1 ...
##
## > measures <- c("LATITUDE", "LONGITUDE")
##
## > dimensions <- setdiff(names(df), measures)
##
## > dimensions
## [1] "Incident_report_number" "Crime_Type"
## [3] "Date" "Time"
## [5] "LOCATION_TYPE" "ADDRESS"
## [7] "State" "Country"
## [9] "Continent"
##
## > measures
## [1] "LATITUDE" "LONGITUDE"
##
## > if (length(measures) > 1 || !is.na(dimensions)) {
## + for (d in dimensions) {
## + if (d == "Date")
## + df[d] <- as.Date(atxCrime$ .... [TRUNCATED]
##
## > View(df)
##
## > if (length(measures) > 1 || !is.na(measures)) {
## + for (m in measures) {
## + df[m] <- data.frame(lapply(df[m], gsub, pattern = "[^--.0-9]", .... [TRUNCATED]
##
## > View(df)
##
## > write.csv(df, paste(gsub(".csv", "", file_path), ".reformatted.csv",
## + sep = ""), row.names = FALSE, na = "")
##
## > tableName <- gsub(" +", "_", gsub("[^A-z, 0-9, ]",
## + "", gsub(".csv", "", file_path)))
##
## > sql <- paste("CREATE TABLE", tableName, "(\n-- Change table_name to the table name you want.\n")
##
## > if (length(measures) > 1 || !is.na(dimensions)) {
## + for (d in dimensions) {
## + if (d == "Date")
## + sql <- paste(sql, paste(d, .... [TRUNCATED]
##
## > if (length(measures) > 1 || !is.na(measures)) {
## + for (m in measures) {
## + if (m != tail(measures, n = 1))
## + sql <- paste(sq .... [TRUNCATED]
##
## > sql <- paste(sql, ");")
##
## > cat(sql)
## CREATE TABLE AustinCrime (
## -- Change table_name to the table name you want.
## Incident_report_number varchar2(4000),
## Crime_Type varchar2(4000),
## Date date,
## Time varchar2(4000),
## LOCATION_TYPE varchar2(4000),
## ADDRESS varchar2(4000),
## State varchar2(4000),
## Country varchar2(4000),
## Continent varchar2(4000),
## LATITUDE number(38,4),
## LONGITUDE number(38,4)
## );
Next, we created the data frame we needed by extracting the information from SQL. Below is the code importing the dataset from Oracle to make the dataframe along with a summary and a subset of the datasets:
Parish Salaries summary and subset of the fire department who have a salary greater than $10,000:
#source("C:/Users/Rabin Bhattarai/Desktop/Data Visualization/DV_FinalProject3/01 Data/Parish_summary_subset.R", echo = TRUE)
source("/Users/JPNJOCKII/dv_finalproject_rabinbhattarai/01 Data/Parish_summary_subset.R", echo = TRUE)
##
## > require("jsonlite")
## Loading required package: jsonlite
##
## > require("RCurl")
## Loading required package: RCurl
## Loading required package: bitops
##
## Attaching package: 'RCurl'
## The following object is masked from 'package:tidyr':
##
## complete
##
## > PARISH_SALARIES_df <- data.frame(fromJSON(getURL(URLencode("oraclerest.cs.utexas.edu:5001/rest/native/?query=\"select * from PARISH_SALARIES\""),
## + .... [TRUNCATED]
##
## > summary(PARISH_SALARIES_df)
## EMPLOYEE_NUM LAST_NAME FIRST_NAME MIDDLE_INITIAL
## Min. : 78 WILLIAMS: 464 MICHAEL: 335 null : 2317
## 1st Qu.:324963 JOHNSON : 400 JAMES : 326 L : 2316
## Median :397632 BROWN : 269 JOHN : 305 M : 2223
## Mean :372329 SMITH : 264 ROBERT : 256 D : 2001
## 3rd Qu.:464416 JACKSON : 240 CHARLES: 216 A : 1922
## Max. :528277 JONES : 240 JOSEPH : 189 J : 1840
## (Other) :21423 (Other):21673 (Other):10681
## DEPARTMENT_NUM DEPARTMENT_NAME DIVISION_NUM
## Min. : 1.00 DEPARTMENT OF PUBLIC WORKS :4425 Min. : 120010
## 1st Qu.:40.00 HUMAN DEVELOPMENT AND SERVICES:4186 1st Qu.:4020116
## Median :51.00 POLICE DEPARTMENT :2730 Median :5130001
## Mean :47.22 LIBRARY BOARD OF CONTROL :2487 Mean :4749240
## 3rd Qu.:60.00 FIRE DEPARTMENT :1387 3rd Qu.:6055086
## Max. :89.00 EMERGENCY MEDICAL SERVICES :1328 Max. :8999999
## (Other) :6757
## DIVISION_NAME JOB_CODE
## LIBRARY BOARD OF CONTROL : 2487 Min. : 19
## POLICE DEPARTMENT-UNIFORM PATROL BUREAU : 1257 1st Qu.:105381
## FIRE DEPT-FIRE SUPPRESSION and PREVENTION: 1161 Median :151380
## DPW-STATE HIGHWAY MAINTENANCE : 1036 Mean :272848
## HEADSTART : 960 3rd Qu.:400074
## WIA YOUTH PROGRAM : 888 Max. :999999
## (Other) :15511
## JOB_TITLE PAY_RANGE PAY_STEP
## LABORER : 1823 Min. : 100 Min. : 1.000
## COLLEGE STUDENT INTERN/CONTRACT: 1788 1st Qu.:1050 1st Qu.: 1.000
## WIA SUMMER EMPLOYMENT WORKER : 1064 Median :2116 Median : 2.000
## DISPLACED DISASTER WORKER : 905 Mean :4204 Mean : 4.874
## POLICE OFFICER : 785 3rd Qu.:9999 3rd Qu.: 8.000
## LIBRARY PAGE (20 HOURS) : 601 Max. :9999 Max. :53.000
## (Other) :16334
## SEX RACE CURRENT_HIRE_DATE EMPLOYMENT_END_DATE
## F:10908 B :13628 06/07/2010: 526 null : 4577
## M:12388 W : 9334 06/01/2009: 510 08/13/2010: 554
## U: 4 A : 114 06/01/2012: 112 07/31/2009: 477
## S : 81 06/09/2016: 108 01/27/2006: 219
## O : 64 06/04/2015: 97 07/08/2016: 131
## U : 56 07/15/2013: 82 07/02/2015: 118
## (Other): 23 (Other) :21865 (Other) :17224
## YEARS_SERVICE SCHEDULED_HOURS LONGEVITY_PERCENTAGE HOURLY_RATE
## Min. : 0.000 Min. : 0.00 Min. :0.00000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.:0.00000 1st Qu.: 7.669
## Median : 1.000 Median : 0.00 Median :0.00000 Median : 9.750
## Mean : 5.149 Mean : 17.88 Mean :0.02129 Mean :12.458
## 3rd Qu.: 6.000 3rd Qu.: 0.00 3rd Qu.:0.00000 3rd Qu.:15.684
## Max. :48.000 Max. : 9999.99 Max. :0.20000 Max. :71.157
##
## TOTAL_HOURLY_RATE OVERTIME_HOURLY_RATE ANNUAL_SALARY EMPLOYMENT_STATUS
## Min. :-3.000 Min. : 0.00 Min. : 0 Min. :0.000
## 1st Qu.: 7.715 1st Qu.: 12.00 1st Qu.: 7248 1st Qu.:5.000
## Median : 9.750 Median : 15.00 Median : 16951 Median :5.000
## Mean :12.840 Mean : 20.25 Mean : 21600 Mean :4.018
## 3rd Qu.:16.502 3rd Qu.: 25.44 3rd Qu.: 31786 3rd Qu.:5.000
## Max. :76.849 Max. :115.27 Max. :148006 Max. :5.000
##
##
## > head(subset(PARISH_SALARIES_df, DEPARTMENT_NAME ==
## + "FIRE DEPARTMENT" & ANNUAL_SALARY > 10000))
## EMPLOYEE_NUM LAST_NAME FIRST_NAME MIDDLE_INITIAL DEPARTMENT_NUM
## 1 502936 EVANS JR CHARLES E 51
## 2 442259 MUNN JORDAN A 51
## 8 302503 ELAM KIMBERLY M 51
## 11 315680 LEBRANE JR CHARLES E 51
## 18 503932 REMSON GARRETT M 51
## 32 391310 MOUILLE CHRISTOPHER W 51
## DEPARTMENT_NAME DIVISION_NUM
## 1 FIRE DEPARTMENT 5120001
## 2 FIRE DEPARTMENT 5120001
## 8 FIRE DEPARTMENT 5110001
## 11 FIRE DEPARTMENT 5120001
## 18 FIRE DEPARTMENT 5120001
## 32 FIRE DEPARTMENT 5120001
## DIVISION_NAME JOB_CODE
## 1 FIRE DEPT-FIRE SUPPRESSION and PREVENTION 170105
## 2 FIRE DEPT-FIRE SUPPRESSION and PREVENTION 170105
## 8 FIRE DEPARTMENT-ADMINISTRATION 400074
## 11 FIRE DEPT-FIRE SUPPRESSION and PREVENTION 170115
## 18 FIRE DEPT-FIRE SUPPRESSION and PREVENTION 170105
## 32 FIRE DEPT-FIRE SUPPRESSION and PREVENTION 170110
## JOB_TITLE PAY_RANGE PAY_STEP SEX RACE
## 1 FIRE FIGHTER 7201 1 M B
## 2 FIRE FIGHTER 7201 12 M W
## 8 COLLEGE STUDENT INTERN/CONTRACT 9999 1 F W
## 11 FIRE CAPTAIN 7220 19 M B
## 18 FIRE FIGHTER 7201 3 M W
## 32 FIRE EQUIP OPER 7210 14 M W
## CURRENT_HIRE_DATE EMPLOYMENT_END_DATE YEARS_SERVICE SCHEDULED_HOURS
## 1 06/03/2013 06/17/2013 0 112
## 2 07/10/2006 null 0 112
## 8 05/27/2003 12/07/2006 3 58
## 11 07/16/1996 null 0 112
## 18 07/15/2013 null 3 112
## 32 07/12/2004 null 12 112
## LONGEVITY_PERCENTAGE HOURLY_RATE TOTAL_HOURLY_RATE OVERTIME_HOURLY_RATE
## 1 0 10.0364 10.0364 15.0546
## 2 0 15.0337 15.0337 25.6412
## 8 0 8.2500 8.2500 12.3750
## 11 0 23.1117 23.1117 37.7582
## 18 0 11.5227 11.5227 20.3747
## 32 0 17.5440 17.5440 29.4068
## ANNUAL_SALARY EMPLOYMENT_STATUS
## 1 29226.08 5
## 2 43778.02 0
## 8 12441.00 5
## 11 67301.26 0
## 18 33554.04 0
## 32 51088.18 0
Austin Crime 2014 summary and subset of Zip Code from 78701:
#source("C:/Users/Rabin Bhattarai/Desktop/Data Visualization/DV_FinalProject3/01 Data/austincrime14_subset.R", echo = TRUE)
source("/Users/JPNJOCKII/dv_finalproject_rabinbhattarai/01 Data/austincrime14_subset.R", echo = TRUE)
##
## > require("jsonlite")
##
## > require("RCurl")
##
## > austinCrime14 <- data.frame(fromJSON(getURL(URLencode("oraclerest.cs.utexas.edu:5001/rest/native/?query=\"select * from austincrime14\""),
## + ht .... [TRUNCATED]
##
## > summary(austinCrime14)
## DATE_OF_CRIME ADDRESS
## 2014-10-13 00:00:00: 169 1030 NORWOOD PARK BLVD : 775
## 2014-05-19 00:00:00: 165 null : 561
## 2014-03-31 00:00:00: 156 2901 S CAPITAL OF TEXAS HWY NB: 399
## 2014-01-27 00:00:00: 155 2508 E RIVERSIDE DR : 311
## 2014-07-14 00:00:00: 155 9300 S IH 35 SVRD SB : 258
## 2014-02-10 00:00:00: 153 11200 LAKELINE MALL DR : 252
## (Other) :39688 (Other) :38085
## ZIP_CODE CRIME_TYPE X_COORDINATE
## 78753 : 3935 THEFT :11040 3129997: 775
## 78741 : 3394 BURGLARY OF VEHICLE : 9993 null : 688
## 78758 : 2826 THEFT BY SHOPLIFTING : 4284 3094135: 404
## 78704 : 2715 BURGLARY OF RESIDENCE: 3980 3120959: 312
## 78745 : 2577 AUTO THEFT : 2188 3100423: 261
## 78702 : 2196 THEFT OF BICYCLE : 1493 3092524: 252
## (Other):22998 (Other) : 7663 (Other):37949
## Y_COORDINATE
## 10096983: 775
## null : 688
## 10066586: 402
## 10059177: 312
## 10033116: 260
## 10143916: 252
## (Other) :37952
##
## > head(subset(austinCrime14, ZIP_CODE == "78701"))
## DATE_OF_CRIME ADDRESS ZIP_CODE
## 38 2014-12-01 00:00:00 500 N IH 35 SVRD SB 78701
## 202 2014-08-20 00:00:00 E 6TH ST / CONGRESS AVE 78701
## 205 2014-09-21 00:00:00 700 BLOCK E 6TH ST 78701
## 209 2014-02-08 00:00:00 700 BLOCK NUECES ST 78701
## 215 2014-01-10 00:00:00 700 BLOCK E 4TH ST 78701
## 218 2014-05-16 00:00:00 700 BLOCK GUADALUPE ST 78701
## CRIME_TYPE X_COORDINATE Y_COORDINATE
## 38 BURGLARY OF RESIDENCE 3116639 10069697
## 202 ROBBERY BY ASSAULT 3114310 10070753
## 205 ROBBERY BY ASSAULT 3116520 10070158
## 209 AGG ROBBERY/DEADLY WEAPON 3112654 10071801
## 215 ROBBERY BY ASSAULT 3116340 10069527
## 218 ROBBERY BY ASSAULT 3113235 10071572
Austin Crime 2016 summary and subset for DWI crimes:
#source("C:/Users/Rabin Bhattarai/Desktop/Data Visualization/DV_FinalProject3/01 Data/austincrime_subset.R, echo = TRUE)
source("/Users/JPNJOCKII/dv_finalproject_rabinbhattarai/01 Data/austincrime_subset.R", echo = TRUE)
##
## > require("jsonlite")
##
## > require("RCurl")
##
## > austinCrime <- data.frame(fromJSON(getURL(URLencode("oraclerest.cs.utexas.edu:5001/rest/native/?query=\"select * from austincrime\""),
## + httphe .... [TRUNCATED]
##
## > summary(austinCrime)
## INCIDENT_REPORT_NUMBER CRIME_TYPE
## Min. :2.016e+08 CRASH/LEAVING THE SCENE :1910
## 1st Qu.:2.017e+09 DWI : 394
## Median :2.016e+10 LOST PROP : 344
## Mean :1.560e+10 DRIVING WHILE LICENSE INVALID: 227
## 3rd Qu.:2.017e+10 THEFT : 224
## Max. :2.016e+12 POSS MARIJUANA : 217
## (Other) :3699
## DATE_OF_CRIME TIME LOCATION_TYPE
## 2016-03-17 00:00:00: 48 Min. : 0 null:7015
## 2016-02-26 00:00:00: 47 1st Qu.: 736
## 2016-03-18 00:00:00: 47 Median :1454
## 2016-01-21 00:00:00: 46 Mean :1315
## 2016-05-07 00:00:00: 46 3rd Qu.:1917
## 2016-04-02 00:00:00: 45 Max. :2359
## (Other) :6736
## ADDRESS STATE
## E 6TH ST / TRINITY ST : 75 Texas:7015
## UNKNOWN : 73
## E 6TH ST / NECHES ST : 47
## E 6TH ST / SAN JACINTO BLVD : 39
## E RIVERSIDE DR / S PLEASANT VALLEY RD: 34
## E 7TH ST / TRINITY ST : 33
## (Other) :6714
## COUNTRY CONTINENT LATITUDE
## United States of America:7015 North America:7015 Min. : 0.0001
## 1st Qu.:30.2430
## Median :30.2690
## Mean :30.2211
## 3rd Qu.:30.3261
## Max. :30.5149
##
## LONGITUDE
## Min. :-97.9533
## 1st Qu.:-97.7489
## Median :-97.7350
## Mean :-97.5232
## 3rd Qu.:-97.7037
## Max. : -0.0001
##
##
## > head(subset(austinCrime, CRIME_TYPE == "DWI"))
## INCIDENT_REPORT_NUMBER CRIME_TYPE DATE_OF_CRIME TIME
## 30 2016270145 DWI 2016-01-27 00:00:00 356
## 157 20161959936 DWI 2016-07-14 00:00:00 422
## 296 2016950771 DWI 2016-04-04 00:00:00 1250
## 557 20161851392 DWI 2016-07-03 00:00:00 1842
## 558 20161859584 DWI 2016-07-04 00:00:00 241
## 940 20161331200 DWI 2016-05-12 00:00:00 328
## LOCATION_TYPE ADDRESS STATE
## 30 null S 1ST ST / CUMBERLAND RD Texas
## 157 null S 1ST ST / W BEN WHITE BLVD SVRD WB Texas
## 296 null DENSON DR / N LAMAR BLVD Texas
## 557 null CENTER RIDGE DR / MC CALLEN PASS Texas
## 558 null W 7TH ST / RIO GRANDE ST Texas
## 940 null N IH 35 SVRD NB / E MARTIN LUTHER KING JR BLVD Texas
## COUNTRY CONTINENT LATITUDE LONGITUDE
## 30 United States of America North America 30.2382 -97.7621
## 157 United States of America North America 30.2269 -97.7692
## 296 United States of America North America 30.3304 -97.7235
## 557 United States of America North America 30.4143 -97.6634
## 558 United States of America North America 30.2709 -97.7492
## 940 United States of America North America 30.2785 -97.7291
Shiny App Images:
Below is our first shiny app visualization. Here we used Salary data from employees in the city of Austin and divided the data into a set (Female vs Male). We wanted to visualize the departments in Austin where women earned more annually than men (and vice-versa) and we were able to accomplish this. (Code used to produce app can be found in 03 Shiny folder) === Here is our visualization of departments were Men earned more annually than Women.
=== Here, we focus on our other dataset regarding crime around Austin. This part of the shiny app lets the user choose a crime, which then produces a visual heat map of where the crime occurred in Austin. There’s also a data table at the bottom for reference to the crimes shown and not shown on the graph.
=== Here is another visulaztion showing where crime occurred in Austin. Just like the previous image, this part of the app allows the user to choose a crime and produce a visualization of the crime locations. In this case, we utilize a bin structure to present the crime locations rather than the previous heat map. —
=== This next part of shiny app utilizes the joining of two datasets (Austin crime 2014 and Austin crime 2016). Creating the join was accomplished by combining both datasets on a common address. This allowed us to see what locations in Austin had different crimes occuring at the same place. This joining mechanism also allowed us to see which zip codes had the most crime, which are shown in the bar graph below. And finally, the user can now visually see where these crimes occurred by selecting a zip code.
===
This marks the end of our shiny app, and here we go back to our Austin Salary data. The produced graphs below provide the user with both interesting data in regard to ethnicities as well as producing basic visualizations for users wanting perform statistical analysis of the data (more visulazitions will be needed.)
URL to Article: http://www.austintexas.gov/edims/document.cfm?id=207153
URL to Article: http://www.theadvocate.com/baton_rouge/news/business/article_f02a27c3-116f-5048-ad1b-9eb20b2abf01.html
The articles state that in Austin women make 83 cents on the dollar compared to men. In Baton Rouge women make 65 cents on the dollar compared to men. However from our data we found that Austin City employees that are women only make 64 cents on the dollar and in Baton Rouge women make 66 cents on the dollar. This shows that Baton Rouge is consistent with its pay for women across the state. However, the City of Austin pays far less to women compared to men than what the city averages.
===
This graph produced in Tableau shows amount of crime per zip code in Austin by shading each area either light (low crime) or dark (high crime)
===
This tableau dashboard provides a visualization comparing crime in 2014 vs 2016.
Here is a link to our Shiny Application: https://jnjock32.shinyapps.io/final/